Core: Add position and equality delta writer interfaces #3176

rdblue merged 2 commits into apache:master
Conversation
import org.apache.iceberg.deletes.PositionDelete;
import org.apache.iceberg.relocated.com.google.common.base.Preconditions;

public class BasePositionDeltaWriter<T> implements PositionDeltaWriter<T> {
This is the writer to use for Spark merge-on-read.
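To make the intended merge-on-read flow concrete, here is a minimal, hypothetical sketch (the class and method names are invented; it only assumes the insert/delete/result signatures visible in this diff):

```java
import java.io.IOException;
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.StructLike;
import org.apache.iceberg.io.PositionDeltaWriter;
import org.apache.iceberg.io.WriteResult;

// Hypothetical sketch: replace one row during merge-on-read by deleting its old
// position and inserting the updated copy, then collect the produced files.
class MergeOnReadSketch<T> {
  WriteResult replaceRow(PositionDeltaWriter<T> writer, CharSequence dataFilePath, long pos,
                         T updatedRow, PartitionSpec spec, StructLike partition) throws IOException {
    writer.delete(dataFilePath, pos, spec, partition); // position delete for the old copy of the row
    writer.insert(updatedRow, spec, partition);        // write the updated row to a new data file
    writer.close();                                    // flush open files before reading the result
    return writer.result();
  }
}
```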
}

@Override
public WriteResult result() {
I am using the existing WriteResult that @openinx created. It has a builder and already takes care of converting values to arrays for serialization.
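For reference, a rough sketch of how a delta writer could assemble its result; the builder method names below are assumptions about the existing WriteResult class, not something added by this PR:

```java
import java.util.List;
import org.apache.iceberg.DataFile;
import org.apache.iceberg.DeleteFile;
import org.apache.iceberg.io.WriteResult;

// Sketch only: collect completed files into a WriteResult via its builder.
class ResultSketch {
  WriteResult toResult(List<DataFile> dataFiles, List<DeleteFile> deleteFiles) {
    return WriteResult.builder()
        .addDataFiles(dataFiles)      // files produced by the insert path
        .addDeleteFiles(deleteFiles)  // delete files produced by the delete path
        .build();
  }
}
```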
 * @param spec a partition spec
 * @param partition a partition or null if the spec is unpartitioned
 */
void deleteKey(T key, PartitionSpec spec, StructLike partition);
@rdblue, this is the directDelete you mentioned. I also added docs about the schema expectations.
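A hypothetical caller-side sketch of that direct delete path (the wrapper class is invented; it only assumes the deleteKey signature shown above):

```java
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.StructLike;
import org.apache.iceberg.io.EqualityDeltaWriter;

// Sketch of the direct delete path: the caller passes a key that contains only the
// configured equality fields (e.g. just an id column), so no copy of the old row is needed.
class DirectDeleteSketch<T> {
  void dropByKey(EqualityDeltaWriter<T> writer, T key, PartitionSpec spec, StructLike partition) {
    // the key's schema is expected to line up with the writer's equality field IDs
    writer.deleteKey(key, spec, partition);
  }
}
```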
 *
 * @param <T> the row type
 */
public interface EqualityDeltaWriter<T> extends Closeable {
This one will be implemented by the CDC writer that I will submit in a separate PR. It is large.
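Not part of this PR, but roughly the call pattern a CDC-style writer could follow on top of this interface (class and method names here are invented for illustration):

```java
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.StructLike;
import org.apache.iceberg.io.EqualityDeltaWriter;

// Hypothetical call pattern for a CDC-style stream; the actual CDC writer lands in a follow-up PR.
class CdcCallPattern<T> {
  void onInsert(EqualityDeltaWriter<T> writer, T row, PartitionSpec spec, StructLike partition) {
    writer.insert(row, spec, partition);
  }

  void onUpdate(EqualityDeltaWriter<T> writer, T oldKey, T newRow,
                PartitionSpec spec, StructLike partition) {
    // model an update as an equality delete of the old key followed by an insert of the new row
    writer.deleteKey(oldKey, spec, partition);
    writer.insert(newRow, spec, partition);
  }

  void onDelete(EqualityDeltaWriter<T> writer, T key, PartitionSpec spec, StructLike partition) {
    writer.deleteKey(key, spec, partition);
  }
}
```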
 * @param spec a partition spec
 * @param partition a partition or null if the spec is unpartitioned
 */
void delete(CharSequence path, long pos, T row, PartitionSpec spec, StructLike partition);
@rdblue, I kept the optional row before spec and partition. In most new APIs, spec and partition are the last arguments, so even though row is optional, keeping spec and partition last seems more consistent.
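To show the ordering side by side, a paraphrased fragment (an illustration only, not the PR file itself):

```java
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.StructLike;

// Illustration of the argument order under discussion: the optional row sits in the
// middle while spec and partition stay last in both overloads.
interface PositionDeleteOrdering<T> {
  void delete(CharSequence path, long pos, PartitionSpec spec, StructLike partition);

  void delete(CharSequence path, long pos, T row, PartitionSpec spec, StructLike partition);
}
```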
/**
 * Deletes a key from the provided spec/partition.
 * <p>
 * This method assumes the delete key schema matches the equality field IDs.
I don't know what this means :)
I'll try to rephrase it then :)
void insert(T row, PartitionSpec spec, StructLike partition);

/**
 * Deletes a position in the provided spec/partition without persisting the old row.
I'm not sure I understand what this one means either. Is the old row the original row matching the position in this delete file? Why would I be persisting it?
The spec allows us to persist the deleted row in positional delete files. This may be helpful to reconstruct CDC records or to persist the sort key for min/max filtering.
That being said, I don't plan to persist it from Spark.
ah so "Delete a position and record the deleted row in the delete file" vs "Delete a position"
Yeah, that's a good way to put it. I'll update.
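Putting the two flavors next to each other, a hypothetical usage sketch (the wrapper class is invented; the calls follow the Javadoc above):

```java
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.StructLike;
import org.apache.iceberg.io.PositionDeltaWriter;

// "Delete a position" vs. "delete a position and record the deleted row in the delete file".
class PositionDeleteFlavors<T> {
  void deletePosition(PositionDeltaWriter<T> writer, CharSequence path, long pos,
                      PartitionSpec spec, StructLike partition) {
    writer.delete(path, pos, spec, partition);
  }

  void deletePositionAndKeepRow(PositionDeltaWriter<T> writer, CharSequence path, long pos,
                                T deletedRow, PartitionSpec spec, StructLike partition) {
    // keeping the row can help reconstruct CDC records or retain sort-key values for min/max filtering
    writer.delete(path, pos, deletedRow, spec, partition);
  }
}
```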
}

/**
 * Deletes a position in the provided spec/partition and persists the old row.
Same question as the last Javadoc.
RussellSpitzer left a comment

Other than the Javadoc comments, I think this is good to go. I would maybe add tests for an "all delete" and an "all insert" operation just to cover those edge cases, but the API looks good to me now that @aokolnychyi explained the purposes :)
Updated the Javadoc and also added tests for the delete-only and insert-only cases.
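For readers following along, a rough outline of what those two edge cases exercise; the harness is hypothetical and the WriteResult accessor names (dataFiles(), deleteFiles()) are assumptions, not quotes from the test code:

```java
import java.io.IOException;
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.StructLike;
import org.apache.iceberg.io.PositionDeltaWriter;
import org.apache.iceberg.io.WriteResult;
import org.junit.Assert;

// Outline of the two edge cases: a run that only deletes and a run that only inserts.
class DeltaWriterEdgeCaseOutline<T> {
  void checkDeleteOnly(PositionDeltaWriter<T> writer, CharSequence path,
                       PartitionSpec spec, StructLike partition) throws IOException {
    writer.delete(path, 0L, spec, partition);
    writer.close();
    WriteResult result = writer.result();
    Assert.assertEquals("Delete-only run should write no data files", 0, result.dataFiles().length);
  }

  void checkInsertOnly(PositionDeltaWriter<T> writer, T row,
                       PartitionSpec spec, StructLike partition) throws IOException {
    writer.insert(row, spec, partition);
    writer.close();
    WriteResult result = writer.result();
    Assert.assertEquals("Insert-only run should write no delete files", 0, result.deleteFiles().length);
  }
}
```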
Looks great. Thanks for getting these in, @aokolnychyi!

Thanks for reviewing, @rdblue @RussellSpitzer!
This PR adds position and equality delta writer interfaces and contains a subset of changes in PR #2945.